Understanding the evolution of job requirements is becoming increasingly important for workers, companies, and public organizations trying to keep pace with the rapid transformation of the labour market. Fortunately, recent natural language processing (NLP) approaches make it possible to develop methods that automatically extract information from job ads and identify skills more precisely. However, these effective approaches require a large amount of annotated data from the studied domain, which is difficult to obtain, mainly because of intellectual property rights. This article proposes a new public dataset, FIJO, containing insurance job offers annotated with many soft skills. To illustrate the potential of this dataset, we detail some of its characteristics and some of its limitations. We then present the results of skill detection algorithms using a named entity recognition approach and show that transformer-based models achieve good token-wise performance on this dataset. Finally, we analyse some of the errors made by our best model to highlight the difficulties that may arise when applying NLP approaches.
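As a rough illustration of the named-entity-recognition formulation described above, the sketch below runs a generic transformer token classifier over a job-ad sentence with BIO-style soft-skill tags. The model name, label set, and example sentence are placeholders, not the actual FIJO setup or annotation scheme.

```python
# Hypothetical sketch: skill detection framed as token classification with BIO tags.
# Model name, labels and example are placeholders, not the FIJO configuration.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-SKILL", "I-SKILL"]           # assumed tag set
model_name = "camembert-base"                  # placeholder French encoder

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=len(labels))

sentence = "Vous savez travailler en équipe et communiquer clairement."
enc = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**enc).logits               # shape: (1, seq_len, num_labels)

pred_ids = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
for tok, pid in zip(tokens, pred_ids):
    print(tok, labels[pid])                    # untrained head: tags are random until fine-tuned
```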
Deep reinforcement learning uses deep neural networks to encode a policy and achieves very good performance across a wide range of applications, but such policies are widely regarded as black-box models. Neuro-fuzzy controllers offer a more interpretable alternative to deep networks. Unfortunately, neuro-fuzzy controllers often need a large number of rules to solve relatively simple tasks, which makes them difficult to interpret. In this work, we propose an algorithm that distils the policy of a deep Q-network into a compact neuro-fuzzy controller. This allows us to train, by distillation, compact neuro-fuzzy controllers that solve tasks they cannot solve directly, combining the flexibility of deep reinforcement learning with the interpretability of a compact rule base. We demonstrate the algorithm on three well-known environments from OpenAI Gym, where we match the performance of a DQN agent using only 2 to 6 fuzzy rules.
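A hedged sketch of the distillation idea (not the paper's actual algorithm): roll out a trained DQN, record the states it visits together with its Q-values as soft targets, and fit a compact student to imitate them. Here `dqn`, `env`, and the student regressor are assumed placeholders, and the student stands in for the compact neuro-fuzzy controller.

```python
# Minimal policy-distillation sketch (assumptions: a trained `dqn(state) -> q_values`
# function and a Gymnasium-style `env` exist). The student is a generic regressor
# standing in for the paper's compact neuro-fuzzy controller.
import numpy as np

def collect_dataset(env, dqn, n_steps=5000):
    """Roll out the greedy DQN policy and record (state, teacher Q-values) pairs."""
    states, targets = [], []
    state, _ = env.reset()
    for _ in range(n_steps):
        q = dqn(state)                                   # teacher soft targets
        states.append(state)
        targets.append(q)
        state, _, terminated, truncated, _ = env.step(int(np.argmax(q)))
        if terminated or truncated:
            state, _ = env.reset()
    return np.array(states), np.array(targets)

def act(student, state):
    """Greedy action of a fitted student with a scikit-learn-like predict()."""
    return int(np.argmax(student.predict(state[None, :])[0]))
```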
Because of its strong theoretical properties, the Shapley value has become very popular for explaining the predictions made by black-box models. Unfortunately, most existing techniques for computing Shapley values are computationally very expensive. We propose PDD-SHAP, an algorithm that uses an ANOVA-based functional decomposition model to approximate the black-box model being explained. This allows us to compute Shapley values orders of magnitude faster than existing methods on large datasets, substantially reducing the amortised cost of computing Shapley values when many predictions need to be explained.
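For reference, and independent of PDD-SHAP's specific construction, the standard definition of the Shapley value of feature $i$ for an explained prediction is given below; the appeal of an additive (ANOVA-style) surrogate is that, once the model is decomposed into low-order components, such quantities can be evaluated cheaply instead of enumerating feature coalitions for every prediction.

```latex
\phi_i(x) \;=\; \sum_{S \subseteq F \setminus \{i\}}
    \frac{|S|!\,\bigl(|F|-|S|-1\bigr)!}{|F|!}
    \Bigl( v\bigl(S \cup \{i\}\bigr) - v(S) \Bigr),
```

where $F$ is the full feature set and $v(S)$ denotes the expected model output when only the features in $S$ are known.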
Research and development in hypersonics has progressed significantly in recent years, with a growing number of military and commercial applications. Public and private organisations in several countries have been investing in hypersonics, aiming to outpace their competitors and secure or improve strategic advantage and deterrence. For these organisations, being able to identify emerging technologies in a timely and reliable manner is essential. Recent advances in information technology have made it possible to analyse large amounts of data, extract hidden patterns, and provide decision-makers with new insights. In this study, we focus on scientific publications about hypersonics from the period 2000-2020 and employ natural language processing and machine learning to characterise the research landscape, identifying 12 main latent research topics and analysing their evolution over time. Our publication similarity analysis reveals patterns indicative of cycles over the two decades of research. The study offers a comprehensive analysis of the research field, and the fact that the research topics are extracted algorithmically removes subjectivity from the exercise and allows consistent comparisons across topics and time intervals.
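The abstract describes algorithmic extraction of latent topics from a publication corpus; a minimal, generic sketch of that kind of pipeline is shown below using LDA from scikit-learn. The corpus, vectoriser settings, and model choice are placeholders, not the study's actual method or parameters.

```python
# Generic topic-modelling sketch (assumption: `abstracts` is a list of publication
# abstracts). Not the study's actual pipeline, corpus, or hyperparameters.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = [
    "placeholder abstract about propulsion and combustion modelling",
    "placeholder abstract about thermal protection materials",
    "placeholder abstract about boundary layer transition at high speeds",
    "placeholder abstract about guidance and control of fast vehicles",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(abstracts)

n_topics = 12                                   # the study reports 12 latent topics
lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
doc_topics = lda.fit_transform(X)               # per-document topic mixtures

terms = vectorizer.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = [terms[i] for i in comp.argsort()[-5:][::-1]]
    print(f"topic {k}: {', '.join(top)}")
```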
The ability to reason about incomplete knowledge, sensing, temporal notions, and numeric constraints is essential in real-world applications. Although several AI planners can handle some of these requirements, they are mostly limited to specific types of constrained problems. This paper presents a new planning approach that combines contingent plan construction with a temporal planning framework, providing solutions that take numeric constraints and incomplete knowledge into account. We propose a small extension to the Planning Domain Definition Language (PDDL) to model (i) incomplete knowledge, (ii) sensing actions that act on unknown propositions, and (iii) the possible outcomes of non-deterministic sensing effects. We also introduce a new set of planning domains to evaluate our solver, which shows good performance on a wide variety of problems.
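As a loose illustration of planning with unknown propositions and sensing (a simplified stand-in, not the proposed PDDL extension or its solver), a belief state can be represented with three-valued propositions, and a sensing action branches the belief state on the possible observed outcomes.

```python
# Simplified stand-in for planning under incomplete knowledge: each proposition is
# TRUE, FALSE, or UNKNOWN, and sensing an UNKNOWN proposition splits the belief
# state into its possible outcomes. Not the paper's PDDL syntax or algorithm.
from enum import Enum

class Val(Enum):
    TRUE = 1
    FALSE = 0
    UNKNOWN = -1

def sense(belief, prop):
    """Return the belief states resulting from observing `prop`."""
    if belief[prop] is not Val.UNKNOWN:
        return [belief]                          # already known: no branching
    branches = []
    for outcome in (Val.TRUE, Val.FALSE):        # non-deterministic sensing outcomes
        b = dict(belief)
        b[prop] = outcome
        branches.append(b)
    return branches

belief = {"door_open": Val.UNKNOWN, "has_key": Val.TRUE}
for b in sense(belief, "door_open"):
    print(b)
```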
Unsupervised feature learning often finds low-dimensional embeddings that capture the structure of complex data. When prior topological knowledge is available for the task at hand, incorporating it into the learned representation may lead to higher-quality embeddings. For example, this can help embed the data into a given number of clusters, or accommodate noise that prevents one from deriving the distribution of the data over the model directly, which can then be learned more effectively. However, general tools for integrating various forms of prior topological knowledge into embeddings are lacking. Although differentiable topology layers have recently been developed that can (re)shape an embedding towards a prescribed topological model, they have two important limitations for representation learning, which we address in this paper. First, the currently proposed topological losses fail to represent simple models such as clusters and flares in a natural way. Second, these losses ignore all the original structural (e.g., neighbourhood) information in the data that is useful for learning. We overcome these limitations by introducing a new set of topological losses, and propose their use as a way of topologically regularising data embeddings so that they naturally represent a prescribed model. We include thorough experiments on synthetic and real data that highlight the usefulness and versatility of this approach, with applications ranging from modelling high-dimensional single-cell data to graph embedding.
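As a heavily simplified sketch of the general idea of topologically regularising an embedding (not the paper's actual losses, which are built on differentiable topology layers), the snippet below adds a toy "k clusters" penalty, pulling embedded points towards the nearest of k learnable centroids, to a reconstruction objective.

```python
# Toy sketch of a topologically regularised embedding objective. The real losses
# in the paper are topological; here a simple "k clusters" penalty (distance to
# the nearest of k learnable centroids) stands in for the prescribed model.
import torch

def embedding_loss(z, x_hat, x, centroids, lam=0.1):
    """z: (n, d) embeddings, x_hat: reconstructions, centroids: (k, d) learnable."""
    recon = torch.mean((x_hat - x) ** 2)          # data-fidelity term
    d = torch.cdist(z, centroids)                 # (n, k) pairwise distances
    topo = torch.mean(d.min(dim=1).values)        # pull each point to its nearest centroid
    return recon + lam * topo

# usage sketch: z = encoder(x); x_hat = decoder(z)
# loss = embedding_loss(z, x_hat, x, centroids); loss.backward()
```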
Distances between data points are widely used in machine learning. However, when corrupted by noise, these distances, and therefore the models based upon them, may lose their usefulness in high dimensions. Indeed, the small marginal effects of the noise can accumulate rapidly, shifting the empirical nearest and furthest neighbours away from the ground-truth ones. In this paper, we characterise such effects in noisy high-dimensional data exactly, using asymptotic probabilistic expressions. Furthermore, although it has previously been argued that neighbourhood queries become meaningless and unstable when distance concentration occurs, in the sense that there is poor relative discrimination between the furthest and closest neighbours in the data, we show that this is not necessarily the case once we decompose the data into a ground-truth component, which we aim to recover, and a noise component. More specifically, we derive explicit conditions under which the empirical neighbourhood relations affected by noise may still be true even when distance concentration occurs. We include a thorough empirical verification of our results, as well as interesting experiments in which our derived phase shift, where neighbours become random or not, is shown to coincide with the phase shift where common dimensionality reduction methods perform poorly or well at recovering low-dimensional reconstructions of high-dimensional data with dense noise.
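A quick numerical illustration of the concentration effect referred to above (a generic simulation, not the paper's analysis or noise model): as the dimension grows, the relative contrast between a query point's furthest and nearest neighbours shrinks.

```python
# Generic simulation of distance concentration: the relative contrast
# (d_max - d_min) / d_min between a query and a sample of i.i.d. Gaussian
# points shrinks as the dimension grows.
import numpy as np

rng = np.random.default_rng(0)
for dim in (2, 10, 100, 1000, 10000):
    points = rng.standard_normal((500, dim))
    query = rng.standard_normal(dim)
    d = np.linalg.norm(points - query, axis=1)
    contrast = (d.max() - d.min()) / d.min()
    print(f"dim={dim:6d}  relative contrast={contrast:.3f}")
```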
In this paper, we propose a novel technique, namely INVALIDATOR, to automatically assess the correctness of APR-generated patches via semantic and syntactic reasoning. INVALIDATOR reasons about program semantics via program invariants, while it also captures program syntax via language semantics learned from a large code corpus using a pre-trained language model. Given a buggy program and the developer-patched program, INVALIDATOR infers likely invariants on both programs. Then, INVALIDATOR determines that an APR-generated patch overfits if it (1) violates correct specifications or (2) maintains erroneous behaviors of the original buggy program. In case our approach fails to determine an overfitting patch based on invariants, INVALIDATOR utilizes a model trained on labeled patches to assess patch correctness based on program syntax. The benefits of INVALIDATOR are three-fold. First, INVALIDATOR is able to leverage both semantic and syntactic reasoning to enhance its discriminative capability. Second, INVALIDATOR does not require new test cases to be generated; instead, it relies only on the current test suite and uses invariant inference to generalize the behaviors of a program. Third, INVALIDATOR is fully automated. We have conducted our experiments on a dataset of 885 patches generated on real-world programs in Defects4J. Experimental results show that INVALIDATOR correctly classifies 79% of overfitting patches, detecting 23% more overfitting patches than the best baseline. INVALIDATOR also substantially outperforms the best baselines by 14% and 19% in terms of Accuracy and F-Measure, respectively.
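A schematic rendering of the two semantic rejection conditions stated above, with hypothetical helper names; the actual invariant inference and the syntax-based classifier used as a fallback are not shown.

```python
# Schematic sketch of the semantic decision rule described in the abstract.
# `patch_invariants`, `correct_specs`, and `error_behaviors` are hypothetical
# placeholders for sets of inferred likely invariants.
def is_overfitting(patch_invariants, correct_specs, error_behaviors):
    violates_correct_spec = not correct_specs.issubset(patch_invariants)
    keeps_error_behavior = bool(error_behaviors & patch_invariants)
    return violates_correct_spec or keeps_error_behavior

# If this semantic check is inconclusive, a model trained on labeled patches
# then judges correctness from program syntax, per the abstract.
```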
Non-linear state-space models, also known as general hidden Markov models, are ubiquitous in statistical machine learning, being the most classical generative models for serial data and sequences in general. The particle-based, rapid incremental smoother PaRIS is a sequential Monte Carlo (SMC) technique allowing for efficient online approximation of expectations of additive functionals under the smoothing distribution in these models. Such expectations appear naturally in several learning contexts, such as maximum-likelihood estimation (MLE) and Markov score climbing (MSC). PaRIS has linear computational complexity and limited memory requirements, and comes with non-asymptotic bounds, convergence results, and stability guarantees. Still, being based on self-normalised importance sampling, the PaRIS estimator is biased. Our first contribution is to design a novel additive smoothing algorithm, the Parisian particle Gibbs (PPG) sampler, which can be viewed as a PaRIS algorithm driven by conditional SMC moves, resulting in bias-reduced estimates of the targeted quantities. We substantiate the PPG algorithm with theoretical results, including new bounds on bias and variance as well as deviation inequalities. Our second contribution is to apply PPG in a learning framework, covering MLE and MSC as special examples. In this context, we establish, under standard assumptions, non-asymptotic bounds highlighting the value of bias reduction and the implicit Rao--Blackwellization of PPG. These are the first non-asymptotic results of this kind in this setting. We illustrate our theoretical results with numerical experiments supporting our claims.
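For context, the additive smoothing expectations that PaRIS and PPG approximate take the generic form below (standard notation for additive functionals in hidden Markov models, not a derivation specific to this paper), where the $h_t$ are the additive terms and the expectation is under the joint smoothing distribution given the observations $y_{0:T}$:

```latex
\mathbb{E}\!\left[\, \sum_{t=0}^{T-1} h_t(X_t, X_{t+1}) \;\middle|\; Y_{0:T} = y_{0:T} \right].
```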
Machine reading comprehension has become one of the most advanced and popular research topics in natural language processing in recent years. Classifying whether a question is answerable is a fairly significant sub-task of machine reading comprehension, yet relatively few studies have addressed it. Retro-Reader is one of the studies that has solved this problem effectively. However, the encoders of most traditional machine reading comprehension models in general, and of Retro-Reader in particular, have not been able to fully exploit the contextual semantic information of the context. Inspired by SemBERT, we use semantic role labels from the SRL task to add semantics to pre-trained language models such as mBERT, XLM-R, and PhoBERT. This experiment was conducted to compare the influence of semantics on answerability classification for Vietnamese machine reading comprehension. Additionally, we hope this experiment will enhance the encoder of the Retro-Reader model's Sketchy Reading Module. The improved Retro-Reader encoder with semantics was applied to the Vietnamese machine reading comprehension task for the first time and obtained positive results.
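A rough sketch of the SemBERT-style idea referred to above: semantic-role-label embeddings are fused with contextual token embeddings before the downstream heads. The class name, dimensions, tag vocabulary, and fusion layer are placeholders, not the actual Retro-Reader modification.

```python
# Rough sketch of fusing SRL tags with contextual token embeddings, in the spirit
# of SemBERT. Dimensions, tag vocabulary, and the fusion layer are placeholders.
import torch
import torch.nn as nn

class SemanticsAugmentedEncoder(nn.Module):
    def __init__(self, backbone, num_srl_tags, srl_dim=32):
        super().__init__()
        self.backbone = backbone                        # e.g. an mBERT / XLM-R / PhoBERT encoder
        self.srl_embed = nn.Embedding(num_srl_tags, srl_dim)
        hidden = backbone.config.hidden_size
        self.fuse = nn.Linear(hidden + srl_dim, hidden)

    def forward(self, input_ids, attention_mask, srl_tag_ids):
        h = self.backbone(input_ids=input_ids,
                          attention_mask=attention_mask).last_hidden_state
        s = self.srl_embed(srl_tag_ids)                 # one SRL tag id per token
        return self.fuse(torch.cat([h, s], dim=-1))     # fused token representations
```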